N-best list generation using word and phoneme recognition fusion
نویسندگان
چکیده
This paper describes an approach for combining phoneme and word recognition to produce an accurate N-best list of hypotheses. We run two decoding threads in parallel. The first performs phoneme recognition, while the other performs word recognition on the same recorded utterance. The output of the word recognition thread is returned as the most likely hypothesis, and the result of the phoneme recognition thread is used to lookup a list of words for the rest of the N-best list. The algorithm is simple to implement and efficient. In our evaluation, we found that this approach has similar performance to the classical lattice-based N-best search methods on isolated word recognition. This method has the potential to improve existing ASR systems or can be used in interactive multi-modal applications.
منابع مشابه
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
متن کاملJoint-sequence models for grapheme-to-phoneme conversion
Grapheme-to-phoneme conversion is the task of finding the pronunciation of a word given its written form. It has important applications in text-to-speech and speech recognition. Joint-sequence models are a simple and theoretically stringent probabilistic framework that is applicable to this problem. This article provides a selfcontained and detailed description of this method. We present a nove...
متن کاملEmpirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems
Diversity is crucial to reducing the word error rate (WER) when fusing multiple automatic speech recognition (ASR) systems. We present an empirical analysis linking diversity and fusion performance. We transcribed speech from the first 2012 US Presidential debate using multiple ASR systems trained with the Kaldi toolkit. We used the N-best ROVER algorithm to perform hypothesis fusion and measur...
متن کاملFuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, utilization of clustering algorithms for data fusion in decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
متن کاملReaction Time in Phoneme Recognition: A Comparative Study among Iranian Upper-Intermediate vs. Advanced EFL Learners at Institute Level
The present study aimed to investigate of reaction time in terms of phoneme recognition: A comparative study among Iranian Upper-Intermediate vs. Advanced EFL Learners at Institute level. The main question this study tried to answer was whether there is no difference in reaction time in terms of phoneme recognition in Iranian learners at Institute level. To answer the question, 5Upper-Intermedi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001